Granularity refined by knowledge: contingency tables and rough sets as tools of discovery

Author

  • Jan M. Zytkow
Abstract

Contingency tables represent data in a granular way and are a well-established tool for the inductive generalization of knowledge from data. We show that the basic concepts of rough sets, such as concept approximation, indiscernibility, and reduct, can be expressed in the language of contingency tables. We further demonstrate the relevance to rough set theory of the additional probabilistic information available in contingency tables, in particular of statistical tests of significance and of predictive strength applied to contingency tables. Tests of both types can strengthen the evaluation mechanisms used in inductive generalization based on rough sets. The granularity of attributes can be improved through feedback from knowledge discovered in data. We demonstrate how 49er's facilities for (1) contingency table refinement, (2) column and row grouping based on correspondence analysis, and (3) the search for equivalence relations between attributes improve both the granularization of attributes and the quality of knowledge. Finally, we demonstrate the limitations of knowledge viewed as concept approximation, which is the focus of rough sets. Transcending that focus and reorienting towards predictive knowledge, and towards the related distinction between possible and impossible (or statistically improbable) situations, will be very useful in expanding the rough sets approach to more expressive forms of knowledge.

1. ROUGH SETS REPRESENTATION BY CONTINGENCY TABLES

Both rough sets and contingency tables are founded on a similar idea of granular empirical data. Both approaches represent empirical objects by n-tuples, or vectors, of attribute values. This representation is common in statistics, databases, machine learning, pattern recognition, and many other areas. Specific to the rough sets and contingency tables approaches are indiscernibility classes: objects are indiscernible if their attribute-value tuples are the same.
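To make the correspondence concrete, here is a minimal Python sketch, assuming a toy dataset of five hypothetical objects (o1 to o5) with invented condition attributes and a decision attribute. It groups objects into indiscernibility classes, builds the contingency table of condition tuples against decision values, and reads the lower and upper approximations of a concept off the table rows: a granule belongs to the lower approximation when its whole row falls in the target column, and to the upper approximation when that column's cell is nonzero. The function names are illustrative only, not taken from 49er or from any rough sets library.

```python
from collections import defaultdict

# Hypothetical toy data: object -> (condition attribute tuple, decision value).
objects = {
    "o1": (("low", "yes"), "sick"),
    "o2": (("low", "yes"), "sick"),
    "o3": (("high", "no"), "healthy"),
    "o4": (("high", "yes"), "sick"),
    "o5": (("high", "yes"), "healthy"),
}

def indiscernibility_classes(objs):
    """Group objects whose condition tuples are identical (the granules)."""
    classes = defaultdict(set)
    for name, (tup, _decision) in objs.items():
        classes[tup].add(name)
    return dict(classes)

def contingency_table(objs):
    """Rows: condition tuples; columns: decision values; cells: object counts."""
    table = defaultdict(lambda: defaultdict(int))
    for _name, (tup, decision) in objs.items():
        table[tup][decision] += 1
    return table

def approximations(objs, target):
    """Lower and upper approximation of the concept `decision == target`,
    computed from the rows of the contingency table."""
    classes = indiscernibility_classes(objs)
    table = contingency_table(objs)
    lower, upper = set(), set()
    for tup, row in table.items():
        total = sum(row.values())
        hits = row.get(target, 0)
        if hits == total:  # every object in the granule has the target value
            lower |= classes[tup]
        if hits > 0:       # at least one object in the granule has it
            upper |= classes[tup]
    return lower, upper

lower, upper = approximations(objects, "sick")
print(sorted(lower))  # ['o1', 'o2']
print(sorted(upper))  # ['o1', 'o2', 'o4', 'o5']
```

In this toy data, o1 and o2 are distinct objects that are nonetheless indiscernible, because they share the tuple ("low", "yes"). The boundary region, upper minus lower, is {o4, o5}: their shared row in the table has nonzero counts in both decision columns.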
Indiscernibility of distinct objects occurs in practice when attribute domains contain small numbers of values. That happens in many databases, especially when attribute values are determined with limited accuracy. When that is not the case, binning, also known as discretization, can be used. Rough sets were introduced in the 1980s by Zdzislaw Pawlak (1991), while the history of contingency tables is several decades older; they play a major role in the theory and practice of statistics. Contingency tables have been used as a general tool for expressing knowledge and for knowledge refinement in 49er (Zembowicz and Zytkow, 1993, 1996). In this paper we place rough sets in the framework of contingency tables. We argue that contingency tables provide an added …
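For attributes with many values, the binning step and the kind of statistical evaluation the abstract mentions can be sketched together. The Python below is a minimal sketch, assuming equal-width binning (one simple discretization scheme among many), the Pearson chi-square statistic as the significance test, and Cramér's V as the measure of predictive strength. These are common choices for contingency tables, not a description of 49er's actual procedures; the helper names and sample data are hypothetical.

```python
import math
from collections import Counter

def equal_width_bins(values, k):
    """Discretize numeric values into k equal-width bins (indices 0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # a constant attribute falls into bin 0
    # The maximum value can land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def chi_square_and_cramers_v(xs, ys):
    """Pearson chi-square and Cramér's V for the contingency table of xs vs ys."""
    n = len(xs)
    cells = Counter(zip(xs, ys))  # observed cell counts
    row_tot, col_tot = Counter(xs), Counter(ys)
    chi2 = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n  # count under independence
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    m = min(len(row_tot), len(col_tot)) - 1
    v = math.sqrt(chi2 / (n * m)) if m > 0 else 0.0
    return chi2, v

def p_value_df1(chi2):
    """Upper-tail chi-square probability, valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical data: a numeric attribute and a two-valued class attribute.
temps = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]

bins = equal_width_bins(temps, 2)
chi2, v = chi_square_and_cramers_v(bins, labels)
print(bins, chi2, v)      # [0, 0, 0, 1, 1, 1] 6.0 1.0
print(p_value_df1(chi2))  # significance of the resulting 2x2 table
```

The p-value helper applies only to 2x2 tables (one degree of freedom), where the chi-square tail equals erfc(sqrt(x/2)). With only six objects the expected cell counts are far too small for the asymptotic test to be trusted, so the numbers are illustrative only.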

Similar resources

Rough Sets and Granular Computing in Geospatial Information

The representation of geographic entities is characterized by inherent granularity due to scale and resolution specific observations. This article discusses the various aspects of rough set-based approximation modeling of spatial and conceptual granularity. It outlines the context and applications of rough set theory in representing objects with intermediate boundaries, spatial reasoning and kn...

Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)

Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is intimately related to factors that determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...

Heuristic Knowledge Discovery for Archaeological Data Using Genetic Algorithms and Rough Sets

The goal of this research is to investigate and develop heuristic tools to extract meaningful knowledge from large-scale archaeological data sets. Database queries help us to answer only simple questions. Intelligent search tools integrate heuristics with knowledge discovery tools and use data to build models of the real world. We would like to investigate these tools and combi...

Granularity of Knowledge, Indiscernibility and Rough Sets

Granularity of knowledge has recently attracted the attention of many researchers. This paper concerns this issue from the rough set perspective. Granularity is inherently connected with the foundation of rough set theory. The concept of the rough set hinges on the classification of objects of interest into similarity classes, which form elementary building blocks (atoms, granules) of knowledge. These granules...

Identification and Evaluation of Functional Dependency Analysis using Rough sets for Knowledge Discovery

The process of data acquisition gained momentum due to the efficient representation of storage/retrieval systems. Due to the commercial and application value of these stored data, database management has become essential for reasons such as consistency and atomicity, giving birth to the DBMS. The existing database management systems cannot provide the needed information when the data is not con...

Publication date: 2000